Abstract —Proliferation of cloud computing has revolutionized
hosting and delivery of Internet-based application services.
However, with the constant launch of new cloud services and 
capabilities almost every month by both big (e.g., Amazon Web 
Service, Microsoft Azure) and small companies (e.g. Rackspace, 
Ninefold), decision makers (e.g. application developers, CIOs) 
are likely to be overwhelmed by choices available. The decision 
making problem is further complicated due to heterogeneous 
service configurations and application provisioning Quality of 
Service (QoS) constraints. To address this hard challenge, in our 
previous work we developed a semi-automated, extensible, and 
ontology-based approach to infrastructure service discovery and 
selection based on only design time constraints (e.g., renting cost, 
datacentre location, service feature, etc.). In this paper, we extend
our approach to include real-time (run-time) QoS (end-to-end
message latency, end-to-end message throughput) in the
decision making process. Hosting next-generation applications
in the domains of on-line interactive gaming, large-scale sensor
analytics, and real-time mobile applications on cloud services
necessitates optimization of such real-time QoS constraints for
meeting Service Level Agreements (SLAs). To this end, we present
a real-time QoS-aware multi-criteria decision making technique
that builds on the well-known Analytic Hierarchy Process (AHP)
method. The proposed technique is applicable to selecting
Infrastructure as a Service (IaaS) cloud offers, and it allows users
to define multiple design-time and real-time QoS constraints or 
requirements. These requirements are then matched against our 
knowledge base to compute possible best fit combinations of cloud 
services at IaaS layer. We conducted extensive experiments to 
prove the feasibility of our approach. 

Index Terms —Decision support, Optimization, Service Selection,
Web-based services

I. Introduction 

In the cloud computing model, users access services according
to their requirements, without needing to know where
the services are hosted or how they are delivered. An increasing
number of IT vendors (e.g., Amazon, GoGrid and Rackspace)
are promising to offer applications, storage and computation
resources as cloud hosting services. As a result, a large
number of competing services are available for users to
choose from [1]. Naturally, it is challenging for users to select
the right services that meet their QoS requirements across the
service cycle, from selection and deployment to orchestration (e.g.,
determining the optimal web service when making a service selection,
or identifying suitable virtual machine servers for deploying web
service instances) [2]. Effective service recommendation
techniques are becoming important to help users (including
developers) in their decision-making processes for critical
application development and deployment [3]. Such applications
include interactive games, real-time social networks, data
analytics, scientific computing, business, Internet of Things
(IoT) and other mobile applications, as discussed next. All
these applications have different needs and requirements.

A. Motivation 

We next provide a few examples of different types of
applications that must meet real-time QoS requirements
during their deployment lifecycle.

Interactive Online Games: In the gaming industry, World
of Warcraft counts over six million unique players on a daily
basis. The operating infrastructure of this Massively Multiplayer
Online Role Playing Game (MMORPG) comprises
more than 10,000 computers [4]. Depending on the game,
typical response times to ensure fluent play must remain below
100 milliseconds in online First Person Shooter (FPS) action
games and below 1-2 seconds for Role-Playing Games
(RPGs). A good game experience is critical for keeping the
players engaged, and has an immediate consequence on the
earnings and popularity of the game operators. Failing to
deliver timely simulation updates leads to a degraded game
experience and triggers player departures and account closures
[6]. A startup gaming company with no existing infrastructure
could launch a new game using public cloud infrastructure, as
cloud services offer the flexibility to scale on demand with no
upfront investment. Using cloud services, game application
services can be dynamically allocated or de-allocated according
to demand fluctuations. Game companies can also better
serve diverse international users thanks to the global presence
of data centers owned by Cloud providers.

Real-time Mobile applications: There is an explosion of
(primarily mobile-based) communication apps. For example,
WhatsApp, acquired by Facebook, has 450 million users;
Viber, acquired by Rakuten, has 200 million users; and
WeChat, a Chinese rival, has 270 million users. For these
apps, low latency (a QoS constraint) is very important for
the real-time collaboration experience. For example, video
conferencing has a limit of about 200 to 250 milliseconds of
delay for a conversation to appear natural. These apps
have requirements similar to those of game apps: they require
a large number of servers to support millions of users, and need
to optimize latency, speed and throughput. It is worth
mentioning that, even for a generic web application,
experiments that delayed page loads in increments of 100
milliseconds found that even very small delays result in
substantial and costly drops in revenue.

Big Data, IoT (Internet of Things) and eScience: We are
closing in on the transfer of a zettabyte of data annually,
resulting from Internet search, social media, business
transactions, and content distribution. Similarly, scientific disciplines
increasingly produce, process, and visualize data sets gathered
from sensors. If the prediction holds true, the Square
Kilometer Array (SKA) radio telescopes will transmit 400,000
petabytes (~400 exabytes) per month, or a whopping 155.7
terabytes per second. Furthermore, the European Space Agency
(ESA) will launch several satellites in the next few years,
which will collect data about the environment, such as air
temperatures and soil conditions, and stream that data back
in real time for analysis. Similarly, in the finance industry, the
New York Stock Exchange creates 1 terabyte of market
and reference data per day covering the use and exchange
of financial instruments, while Twitter feeds
generate 8 terabytes of data per day of social interactions.
Such “data explosions” have led to research issues such
as how to effectively and optimally manage and analyze such
large amounts of data. This issue is also known as the “Big Data”
problem, which is defined as the practice of collecting
complex data sets so large that they become difficult to analyze
and interpret manually or using on-hand data management
applications (e.g., Microsoft Excel). Because both storing and
analyzing the data require massive amounts of storage capacity
and processing power, companies and institutions may want
to offload the complexity of managing hardware infrastructure
to Cloud providers who specialize in it, which also eliminates
the need to wait for facilities to be built.

Other: Apart from the above-mentioned scenarios, there are
many more cases in which our proposed solution would be useful.

A stock investor, whether an individual or a firm, may want to test
a new strategy for monitoring and analyzing data that automatically
triggers an alert when a certain price pattern or keyword is
identified in the source data. This may periodically require a lot of
compute resources. System administrators and developers
may need many simulated clients from all around the world
to load test a website before its official release.

A miner of Bitcoin (or a similar cryptocurrency)
may decide to invest in additional mining resources
when the price of the currency is high, and stop
mining when the profit no longer justifies the expense.

B. The Problem 

While the elastic nature of cloud services makes them suitable
for provisioning the aforementioned applications, the heterogeneity
of cloud service configurations and their distributed nature
raise some serious technical challenges. In particular, we deal
with the following research problems:


Selecting Optimal Service Configuration: The cloud
computing landscape is evolving with multiple and diverse
options for compute (also known as virtual machine) and
storage services. Hence, application owners face a
daunting task when trying to select cloud services that can
meet their constraints. According to Burstorm, there are
over 426 compute and storage service providers
with deployments in over 11,072 locations. Even within a
particular provider there are different variations of the services.
For example, Amazon Web Services (AWS) has 674 different
offerings differentiated by price, QoS features and location.
In addition, every quarter they add about 4 new services,
change business models (price and terms) and sometimes even
add new locations. To select the best mix of
service offerings from an abundance of possibilities, application
owners must simultaneously consider and optimize complex
dependencies and heterogeneous sets of criteria (price, features,
location, QoS, etc.). For instance, it is not enough to
select an optimal cloud storage service; corresponding computing
capabilities are essential to guarantee that one is able to
process the data as fast as possible while minimizing the cost.

Incorporating Network QoS-awareness in Service Selection
Process: As cloud data centers are distributed across
the Internet, the network QoS (data transfer latency) varies.
This variation depends on the location of the data center
and the location of the input data stream. Current approaches do not
differentiate between the QoS of compute and storage services
and the QoS of the wide area network that interconnects
input data stream sources to cloud data centers. This raises
a research question: how can we optimize the process of choosing
the best compute and storage services, which are not only
optimized in terms of price, availability and processing speed but
also offer good QoS (e.g., network throughput and response
delivery latency)?

C. Our Contributions 

We propose a new technique that aids in network QoS-aware
selection of cloud services for provisioning mobile (or any device
with Internet access but limited processing capability and
storage), real-time and interactive applications. We build upon
our previous work, where we developed an automated
approach, along with a unified domain model capable of
fully describing infrastructure services in Cloud computing
[20], [21]. While our previous approach supports simple cloud
infrastructure service selection based on declarative Structured
Query Language (SQL), it does not take into account real-time,
variable network QoS constraints. Furthermore, a declarative
SQL-based selection approach only allows users to compare
and select a cloud service based on a single criterion (e.g.,
total cost, maximum size limit for storage, memory size for a
compute instance). In other words, our previous approach was
not capable of supporting a utility function that combines
multiple selection criteria pertaining to storage, compute, and
network services. In this paper, we make the following concrete
contributions:

1. Problem Formulation. We provide a clear formulation of
the research problem by identifying the most important cloud
service selection criteria relevant to specific real-time QoS-driven
applications, selection objectives, and cloud service
alternatives.

TABLE I

A BRIEF COMPARISON OF THE CLOUD RECOMMENDER WITH OTHER EXISTING SOLUTIONS

| Product | QoS Benchmark | Single-Criteria Comparison | Aggregate-Ranking Comparison | Cloud Management |
|---|---|---|---|---|
| Broker @ Cloud | No evidence on progress of project | | | |
| Yuruware | No | No | No | Yes |
| CloudHarmony | Adjustable | No | No | No |
| Cloudorado | No | Yes | No | No |
| CloudBroker | Adjustable | Yes | No | No |
| CloudRecommender | Fixed | Yes | Yes | No |

2. Multi-criteria QoS Optimization. We adopt and implement
an Analytic Hierarchy Process (AHP) based decision-making
(service selection) technique that handles multiple
quantitative (i.e., numeric) as well as qualitative (descriptive,
non-numeric, such as location, CPU architecture (32 or 64 bit), and
operating system) QoS criteria. AHP determines the relative
importance of criteria to each user by conducting pair-wise
comparisons.

3. Network-aware QoS Computation. We implement a 
generic service that helps in collecting network QoS values 
from different points on the Internet (modeling big data source 
location) to the cloud data centers. 

The paper is structured as follows. In Section II we survey
the state of the art in Cloud Service Selection and Comparison
(CSSC) techniques. We also highlight their significant
limitations, and their relationship to and dependency on
prior concepts from other fields in computing. In Section III
we present the extension we made to our previously proposed
decision making framework. We also explain the benefits of
applying AHP and the importance of considering QoS. In Section IV
we present evaluations (conducted in a real-world context) of the
proposed decision support tool and techniques, which
automate the mapping of users' specified application requirements
to specific Cloud service configurations. In Section V we
conclude and point out open research questions and future
directions in this increasingly important area.

II. BACKGROUND AND RELATED WORK 

Though branded calculators are available from individual
cloud providers, such as Amazon [22] and Azure [23], for
calculating service leasing cost, it is not easy for users to
generalize their requirements to fit different service offers
(with various quotas and limitations), let alone compute and
compare costs. A number of research [24] and commercial
projects (mostly in their early stages) provide simple cost
calculation or benchmarking and status monitoring, but none
is capable of consolidating all aspects and providing a
comprehensive ranking of infrastructure services. For instance,
CloudHarmony [25] provides up-to-date benchmark results
without considering cost, while Cloudorado [26] calculates the price
of IaaS-level CPU services based on static features (e.g.,
processor type, processor speed, I/O capacity) while ignoring
dynamic QoS features (e.g., latency, throughput). Yuruware
[27] provided a Compare service during its beta version
in 2012 (now removed or integrated into another service).
Although it aims to provide an integrated tool with monitoring
and deployment capabilities, the tool is still under development.
Another similar system is Swinburne University’s Smart Cloud
Broker Service [28]. From the screencast they released, we
can tell that their benchmarking is done in real time, which
means users have to wait for the results to come back. We
considered this approach, but decided to collect the
benchmarking results beforehand, because this way, no matter
how many cloud providers users want to compare, they
can still get the result with minimal (or no) waiting time.
Another reason for this choice is that the network benchmark
result at any particular point in time is not
conclusive, as performance fluctuates over time; we therefore use
an aggregated average, which is a more reliable overall indication.

To further distinguish ourselves from others, we offer the
following two innovative features when ranking, selecting, and
comparing various vendor services: 1) users can choose
to include QoS requirements in the comparison; 2) when
users want to take into account mixed qualitative (e.g., hosting
region, operating system type) and quantitative criteria, we
apply the Analytic Hierarchy Process (AHP) to aggregate
numerical measurements and non-numerical evaluations. Results
are personalized according to each user’s preferences, because
AHP takes the user’s perceived relative importance of criteria
(pair-wise comparisons) as input.

Table I shows a brief comparison of CloudRecommender
with the other existing products mentioned previously. Note
that we are mainly interested in the first three features.
Yuruware claimed to have comparison features in the past,
but removed them later.

Menzel and Ranjan [29] introduced a framework called
“CloudGenius” that supports the decision-making process for
migrating web servers into the cloud. Our system supplements and
partially extends their work. While “CloudGenius” focuses on
Virtual Machine (VM) selection, which means it considers
software requirements (e.g., operating system version, supported
languages), our study focuses more on hardware requirements
(e.g., size of memory and hard disk). Although we
borrowed the idea of using AHP (with simplification) for
rank calculation from “CloudGenius”, we use it differently:
we apply the method in our declarative program, which
mainly handles data and calculation with a database and SQL.
This means it may be easier to scale out the solution using
Hive [30] with minimal change, as opposed to rewriting Java
code to fit the MapReduce framework [31].

Queuing theory is one of the most studied methods for
QoS modeling and control from the infrastructure system
administrator's perspective [32], but our case is different,
because we have no control over the infrastructure. Since we can
only measure the QoS, we collected the statistics using the
“speedtest” service provided by CloudHarmony, owing to the easy
adoption and ever-evolving nature of this service. Klein et al.
proposed a highly theoretical model based on Euclidean
distance for estimating latency, which we believe omits
too many details to be practically accurate. However, we can
use this model to estimate latency when QoS data is not
available for a new client location.

Methods have been proposed for network-aware service
composition considering generic web services,
i.e., at the Software-as-a-Service (SaaS) and Platform-as-a-Service
(PaaS) levels. However, the compatibility constraints at the
IaaS level are different from those of web services. For example,
generic web services are distinguished by their features, QoS and
prices. It does not make sense to include two identical services
in one composition, as one job does not need to be done twice,
but using multiple units of an IaaS offer is perfectly valid.


TABLE II

SYMBOLS USED IN THE FORMULAS

| Symbol | Meaning |
|---|---|
| $\alpha$ | Resource usage; behaves like a decision variable. |
| $C$ | Set of all possible Cloud providers. |
| $c$ | Cloud provider, e.g., Amazon, Rackspace, GoGrid. |
| $D$ | Downloading speed. |
| $i$ | Identifies a request. |
| $L$ | Set of all possible datacenter locations. |
| $l$ | A datacenter location, e.g., Sydney, Tokyo. |
| $\mathcal{L}$ | Latency (download). |
| $M$ | Memory size (e.g., 8 GB). |
| $P$ | Price. |
| $R$ | Set of all possible resources, including all types, whether Compute, Storage or Network. |
| $r$ | Identifies a resource, e.g., GoGrid XX-Large Instance, S3 Storage Service, EC2 instance. |
| 7 | Set of requests from one user. |
| $S$ | Storage. |
| $T$ | Period of time the resource is used. |
| $t$ | Exact point in time, like a time stamp. |
| $U$ | CPU speed. |
| $\mu$ | Uploading speed. |
| $w$ | Weight. |


III. System Design 

This section describes our system’s architecture and gives
details on how it is realised, i.e., the formulas by which weight,
rating and cost are calculated. We keep all the formulas in subsection
III-A; we then show in which step the different formulas are
applied and how they relate to each other in subsection III-B. In
the last subsection, we provide illustrations of the overall system
design and include any noteworthy details that do not
fit into the previous subsections.


A. Formal Model 

To give a conceptual explanation of our approach to addressing
the QoS optimization problem, we define a formal model in
this section. Based on the formal model, we describe the
concepts that are incorporated in the algorithm presented
later. In particular, we define a cost estimation function
using resource utilization estimates, and a benefit-cost ratio-based
evaluation function that considers weights. Furthermore,
we present a pair-wise comparison method to calculate
normalized weights. For more precise resource utilization
estimates, we show how variable resource utilization patterns
can be incorporated into cost estimation.

1) Cost Estimation: Let $\alpha$ be the usage of a
particular resource at a data center location of a Cloud
provider. For example, we can use $\alpha_{storage,any,any} = 50$ GB
to represent a user’s need to store 50 GB of data in the cloud.
The symbols’ meanings are summarized in Table II. Equation
(1) means that the usage of the compute resource $r$ from provider
$c$ at location $l$ is between 0 and $n$. This value is usually
suggested by users; our assumption is that users have
a rough estimate of how many resources they might need.

$$\alpha_{r,c,l} \in \{0, 1, \ldots, n\} \qquad (1)$$

To calculate the cost (represented by the function $p$) of one kind
of resource used at one point in time, we multiply its usage
by the corresponding unit price ($P$):

$$p(t) = \alpha_{r,c,l} P_{r,c,l} \qquad (2)$$

After initially filtering out the options that are not appropriate for
the user, we can calculate the total (minimum) price per unit
time for the desired resource(s), assuming a constant resource usage
pattern throughout the time, as in (3). We assume users
will choose the time period ($T$) for which they want to estimate
the price, e.g., 1 hour or 30 days.

$$\alpha_{r,c,l} P_{r,c,l} T_{r,c,l} \qquad (3)$$
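The cost terms above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the class name `ResourceRequest`, the field names and all prices are hypothetical, and the per-resource costs are summed over resources in the way the cost enters the cost/benefit ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRequest:
    """One requested resource r at provider c and location l (illustrative names)."""
    name: str
    usage: float       # alpha_{r,c,l}: units requested (e.g. instances or GB)
    unit_price: float  # P_{r,c,l}: price per unit per hour (hypothetical value)
    hours: float       # T_{r,c,l}: period the resource is used, in hours


def instant_cost(req: ResourceRequest) -> float:
    """Usage multiplied by unit price, as in equation (2)."""
    return req.usage * req.unit_price


def total_cost(requests: list[ResourceRequest]) -> float:
    """Usage * unit price * period, as in (3), summed over all requested resources."""
    return sum(instant_cost(r) * r.hours for r in requests)


# Hypothetical prices, chosen only to keep the arithmetic easy to follow.
requests = [
    ResourceRequest("compute", usage=2, unit_price=0.10, hours=720),     # 2 VMs, 30 days
    ResourceRequest("storage", usage=50, unit_price=0.0001, hours=720),  # 50 GB, 30 days
]
print(round(total_cost(requests), 2))  # 147.6
```

Here the compute request contributes 2 x 0.10 x 720 = 144 and the storage request 50 x 0.0001 x 720 = 3.6.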

2) Cost Benefit Ratio: In our decision making framework,
we consider the following QoS statistics: download latency ($\mathcal{L}$),
download speed ($D$) and upload speed ($\mu$). These characteristics
are important for end-user experience and satisfaction.
Some options differ only slightly in price, and for some users
a high-quality service is more important than
saving money. We therefore calculate the cost/benefit ratio
for the requested resources as in (4).

$$\mathcal{R}_i = \frac{w_1 \sum_{r} \alpha_{c,l,r} P_{c,l,r} T_{c,l,r} + w_2 \mathcal{L}_{c,l,r}}{w_3 \mu_{c,l,r} + w_4 D_{c,l,r}} \qquad (4)$$

Because users are likely to select a combination of compute,
storage and network services, the cost is summed over the
resources.

Note that the network QoS of the Compute and Storage services
are both collected and then stored separately, since a user may be
interested in only one of the services. For example, transferring
files from (and to) the relatively “local” mounted storage of a
compute instance is different from downloading or uploading
files from/to a dedicated storage-only service (like AWS S3).
If the user selects both, we use the average. For
instance, in the equation we use $D$ to denote
the average of $D_{compute}$ (download speed measured from the
Compute service) and $D_{storage}$ (download speed measured
from the Storage service).
TABLE III

ABSOLUTE VALUE AND CORRESPONDING DESCRIPTIVE SCALE REPRESENTING RELATIVE IMPORTANCE

| Scale | Value | Reciprocal* |
|---|---|---|
| equal | 1 | 1 |
| moderate | 3 | 1/3 |
| strong | 5 | 1/5 |
| very strong | 7 | 1/7 |
| extreme | 9 | 1/9 |

*If activity i has one of the above nonzero numbers assigned to it when
compared with activity j, then j has the reciprocal value when compared
with i.

TABLE IV

SYMBOLS USED IN WEIGHT EXPLANATION

| Symbol | Meaning |
|---|---|
| $\tau$ | $\sum_{n=1}^{4} y_n$ |
| $V$ | Value given by the user to rate the importance. |
| $V_{compute\ disk}$ | How important is the size of disk space on the VM. |
| $V_{cost}$ | Importance value for cost. |
| $V_{latency}$ | Importance value for Download Latency. |
| $V_{ram}$ | How important is the size of memory allocated to the VM. |
| $V_{speed\ upload}$ | Importance value for Upload Speed. |
| $V_{speed\ download}$ | Importance value for Download Speed. |
| $x$ | Some user input value. |
| $y$ | Sum of the row values. |
| $y_1$ | $\left(\sum_{n=1}^{3} x_n\right) + 1$ |
| $y_2$ | $\frac{1}{x_1} + 1 + \sum_{n=4}^{5} x_n$ |

Fig. 1. Criteria taken into consideration during comparison: (a) criteria to
maximize; (b) criteria to minimize. There are two
categories: benefit and cost. “Benefit” groups the “good” criteria, which are
meant to be maximized. Similarly, “Cost” groups the “bad” criteria, to be
minimized. The actual values to be collected and stored are at the “leaves” (i.e.,
nodes/criteria with no children) of the “tree”. For example, under “Benefit”,
numeric values are collected for “Download/Upload Speed”, “CPU Speed”
and “Number of Cores”. “QoS to Maximize” is the parent category to which
“Download/Upload Speed” belongs; no value is stored for this node.


The fully fledged AHP method computes the eigenvector by
repeated matrix squaring: each iteration gains a small
improvement in the precision of the eigenvector at the cost
of expensive computation, and the process is repeated until
no significant difference (i.e., to four decimal places) can be
observed. In our case, we noticed that the improvement is so
small that this rule can be relaxed by omitting the
matrix-squaring iterations.


The symbol $w$ represents a weight, which measures the user's
perceived importance of a parameter. The constraints
$w_1 + w_2 = 1$ and $w_3 + w_4 = 1$ mean that the weights of the
cost terms and the weights of the benefit terms
each sum to one. Fig. 1 shows the criteria to be optimized.
They are categorized into two groups: those to be maximized and
those to be minimized.

As we named this ratio the “Cost Benefit Ratio”, we put cost in
the numerator and benefit in the denominator. As a result, a
smaller ratio indicates a better option. Reversing the
numerator and denominator also works; it simply means that a bigger
ratio indicates a better option.
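As a rough illustration of how the ratio in (4) could be evaluated and used for ranking, the following Python sketch computes it for two made-up offers. The function names, the offer values and the assumption that all inputs are already expressed on comparable scales are ours, not part of the original framework.

```python
def average_speed(compute_speed, storage_speed):
    """Average of the speeds measured from the Compute and Storage services."""
    return 0.5 * (compute_speed + storage_speed)


def cost_benefit_ratio(cost, latency, upload, download,
                       w_cost=0.5, w_latency=0.5, w_upload=0.5, w_download=0.5):
    """Weighted cost terms over weighted benefit terms, as in (4); smaller is better."""
    if abs(w_cost + w_latency - 1) > 1e-9 or abs(w_upload + w_download - 1) > 1e-9:
        raise ValueError("cost weights and benefit weights must each sum to 1")
    numerator = w_cost * cost + w_latency * latency
    denominator = w_upload * upload + w_download * download
    if denominator <= 0:
        raise ValueError("benefit terms must be positive")
    return numerator / denominator


# Two hypothetical offers; every number below is invented for illustration.
offers = {
    "offer_a": dict(cost=100.0, latency=50.0, upload=10.0,
                    download=average_speed(20.0, 40.0)),
    "offer_b": dict(cost=120.0, latency=20.0, upload=10.0,
                    download=average_speed(30.0, 50.0)),
}

# Smaller ratio first: offer_b (70 / 25 = 2.8) beats offer_a (75 / 20 = 3.75).
ranked = sorted(offers, key=lambda name: cost_benefit_ratio(**offers[name]))
print(ranked)  # ['offer_b', 'offer_a']
```

With the default weights of 0.5 the function corresponds to treating all criteria equally; other weights can be passed as long as each pair sums to one.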

3) Weight computed by Pairwise Comparison: The weight
is calculated based on AHP’s pair-wise comparison method.
We choose the commonly used scale [38], [39] shown in Table
III. If the user chooses to treat all options equally, (4) becomes
(5).

$$\frac{0.5 \sum_{r} \alpha_{c,l,r} P_{c,l,r} T_{c,l,r} + 0.5\, \mathcal{L}_{c,l,r}}{0.5\, \mu_{c,l,r} + 0.5\, D_{c,l,r}} \qquad (5)$$

Otherwise, the weights are calculated as shown in Table V.
The meaning of the symbols is explained in Table IV.


$$\begin{pmatrix} y_1/\tau \\ y_2/\tau \\ y_3/\tau \\ y_4/\tau \end{pmatrix} \qquad (6)$$


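For example, a user may have the preferences shown in Table
VI, which produce the preference matrix $M_1$:

$$M_1 = \begin{pmatrix} 1 & 1/3 & 1/5 & 1/5 \\ 3 & 1 & 3 & 5 \\ 5 & 1/3 & 1 & 3 \\ 5 & 1/5 & 1/3 & 1 \end{pmatrix} \qquad (7)$$

TABLE VI

EXAMPLE USER PREFERENCE

| | $V_{speed\ upload}$ | $V_{speed\ download}$ | $V_{ram}$ | $V_{compute\ disk}$ |
|---|---|---|---|---|
| $V_{speed\ upload}$ | 1 | 1/3 | 1/5 | 1/5 |
| $V_{speed\ download}$ | | 1 | 3 | 5 |
| $V_{ram}$ | | | 1 | 3 |
| $V_{compute\ disk}$ | | | | 1 |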
1 
TABLE V

MATRIX ILLUSTRATING HOW TO TURN PAIR-WISE PREFERENCE INTO GLOBAL WEIGHT

| | $V_{speed\ upload}$ | $V_{speed\ download}$ | $V_{ram}$ | $V_{compute\ disk}$ | Row Sum | Weight |
|---|---|---|---|---|---|---|
| $V_{speed\ upload}$ | 1 | $x_1$ | $x_2$ | $x_3$ | $y_1$ | $y_1/\tau$ |
| $V_{speed\ download}$ | $1/x_1$ | 1 | $x_4$ | $x_5$ | $y_2$ | $y_2/\tau$ |
| $V_{ram}$ | $1/x_2$ | $1/x_4$ | 1 | $x_6$ | $y_3$ | $y_3/\tau$ |
| $V_{compute\ disk}$ | $1/x_3$ | $1/x_5$ | $1/x_6$ | 1 | $y_4$ | $y_4/\tau$ |
| Column Sum | | | | | $\tau$ | 1 |

Table VII shows the step-by-step computation of the
eigenvector from (7) before matrix squaring.

TABLE VII

EXAMPLE EIGENVECTOR CALCULATION

| Row | Row Sum |
|---|---|
| 1 + 0.3333 + 0.2 + 0.2 | 1.7333 |
| 3 + 1 + 3 + 5 | 13 |
| 5 + 0.3333 + 1 + 3 | 9.3333 |
| 5 + 0.2 + 0.3333 + 1 | 6.5333 |
| Column Sum | 30.5999 |


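The resulting eigenvector is:

$$v_1 = \begin{pmatrix} 0.0566 \\ 0.4248 \\ 0.3050 \\ 0.2135 \end{pmatrix} \qquad (8)$$

If we square the matrix $M_1$ we get:

$$M_2 = M_1 \times M_1 = \begin{pmatrix} 4 & 58/75 & 22/15 & 8/3 \\ 46 & 4 & 124/15 & 98/5 \\ 26 & 44/15 & 4 & 26/3 \\ 184/15 & 98/45 & 34/15 & 4 \end{pmatrix} \qquad (9)$$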


TABLE VIII

SYMBOLS USED IN ALGORITHM

| Symbol | Meaning |
|---|---|
| AvgQoS | Table/relation containing the QoS data collected. |
| $D_{compute}$ | Download speed from the compute instance. |
| $D_{storage}$ | Download speed from pure storage, e.g., S3. |
| $D$ | Average download speed, calculated as $\frac{1}{2}(D_{compute} + D_{storage})$. |
| $\ell$ | $\ell \subseteq L$: a set of locations specified by the user; by default $\ell = L$, which means all available locations are considered. |
| $M_{min}$ | Minimum memory requirement of the compute instance/server. |
| $price_{max}$ | The maximum price one is willing to spend. |
| $p$ | $p \subseteq C$: a set of Cloud providers specified by the user; by default $p = C$, which means all available providers are considered. |
| $H_{compute}$ | Table/relation containing all data collected about Compute resources. |
| $H_{network}$ | Table/relation containing all data collected about Network resources. |
| $H_{storage}$ | Table/relation containing all data collected about Storage resources. |
| $\mu$ | Average upload speed, similar to $D$. |
| $U$ | A tuple representing the estimated usages provided by the user: $(U_{compute}, U_{storage}, U_{data\ in}, U_{data\ out})$. |
| $W$ | A tuple representing the preference/weight given to each component by the user: $(W_{compute}, W_{storage}, W_{network}, W_{download}, W_{upload}, W_{latency})$. |


The eigenvector calculated from $M_2$ is:

$$v_2 = \begin{pmatrix} 0.0597 \\ 0.5223 \\ 0.279 \\ 0.1389 \end{pmatrix} \qquad (10)$$
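The row-sum normalisation and the matrix squaring step can be reproduced with a few lines of Python. This is a sketch only: the function names are ours, and exact fractions are used so that rounding happens just once at the end. With exact arithmetic the second row of $M_1$ sums to 12 rather than the 13 listed in Table VII, so the first-pass weights printed below differ slightly from those in (8), while the weights obtained after squaring agree with (10).

```python
from fractions import Fraction as F

# Preference matrix M1 from equation (7).
M1 = [
    [F(1), F(1, 3), F(1, 5), F(1, 5)],
    [F(3), F(1),    F(3),    F(5)],
    [F(5), F(1, 3), F(1),    F(3)],
    [F(5), F(1, 5), F(1, 3), F(1)],
]


def row_sum_weights(matrix):
    """Normalise each row sum y_i by the total of all row sums (tau)."""
    row_sums = [sum(row) for row in matrix]
    total = sum(row_sums)
    return [float(s / total) for s in row_sums]


def square(matrix):
    """Plain matrix product of a square matrix with itself."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


M2 = square(M1)
v1 = row_sum_weights(M1)
v2 = row_sum_weights(M2)
print([round(w, 4) for w in v1])  # [0.0586, 0.4054, 0.3153, 0.2207]
print([round(w, 4) for w in v2])  # [0.0597, 0.5223, 0.279, 0.1389]
```

Repeating `square` and `row_sum_weights` until the rounded weights stop changing would give the full iterative procedure; the simplified approach stops after the first normalisation.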


The change in the values of the new eigenvector is very small,
which is why we decided to omit this step and use the original
weight values ($v_1$). Assuming that the preferences for cost and
latency are 0.8 and 0.2, respectively, we can calculate the overall rank
as shown in (11):

$$\left(0.8 \sum_{r} \alpha_{c,l,r} P_{c,l,r} T_{c,l,r} + 0.2\, \mathcal{L}_{c,l,r}\right) \left(0.0566\, \mu_{c,l,r} + 0.4248\, D_{c,l,r} + 0.3050 \sum M_{c,l,r} + 0.2135 \sum S_{c,l,r}\right)^{-1} \qquad (11)$$

where $M$ represents memory size and $S$ is storage size.


TABLE IX

SYMBOLS USED IN ALGORITHM: RELATIONAL ALGEBRA AND SET OPERATIONS

| Symbol | Meaning |
|---|---|
| $\mathcal{G}$ | Aggregation operation over a schema, like a GROUP BY clause in SQL. It follows the format ${}_{g}\mathcal{G}_{agg\_op(attr_i)}(r)$, where $g$ is the grouping attribute and $agg\_op(attr_i)$ is the aggregation operation over the attribute $attr_i$. Five aggregate functions are included with most relational database systems: Sum, Count, Average, Maximum and Minimum. $r$ is an arbitrary relation. See the relational algebra wiki page [40] for more details. |
| $\sigma$ | Selection, see [40]. |
| $\bowtie$ | Natural join: depending on the condition, it can be either a $\theta$-join or an equijoin. For example, $\bowtie_{(Provider, Location)}$ means an equijoin whose condition is to join only under the same provider and location. |
| $\cup$ | Set union operation. |
| $\mapsto$ | Ordered pair; here we use it to denote a new record being formed. |


B. Algorithm 

It’s more likely that users choose to use a single provider to 
eliminate costly cross-provider data transfer, but others may 
have the need to use multiple providers to achieve greater 
coverage and disaster resilience. 


We have abstracted our approach in Algorithm 1. Most of the
symbols can be found in Tables VIII and IX; some symbols
are defined earlier in Tables II and IV. We have separated the
relational algebra and set operations into Table IX; please pay
attention to the operation $\mathcal{G}$, as it has multiple inputs
represented by superscripts and subscripts.

Algorithm 1: orderedSolutions($\ell$, $M_{min}$, $price_{max}$, $p$, $U$, $W$)

    // Filtering on the static characteristics.
    $\psi_{compute} := \sigma_{provider \in p \,\wedge\, location \in \ell \,\wedge\, memory > M_{min}}(H_{compute})$
    // Link it with QoS statistics.
    $\psi_{compute} := \psi_{compute} \bowtie_{provider, location, serviceName} AvgQoS$
    $\psi_{storage} := \sigma_{provider \in p \,\wedge\, location \in \ell \,\wedge\, quota_{low} < U_{storage}}(H_{storage})$
    // Calculating storage price for each tier.
    storageCostByQuota := empty list
    foreach $\zeta_s \in \psi_{storage}$ do
        if $quota_{min}(\zeta_s) < U_{storage}$ then
            if $quota_{max}(\zeta_s) < U_{storage}$ then
                storageCostByQuota := storageCostByQuota $\cup$ {$\zeta_s \mapsto quota_{max}(\zeta_s) * unitPrice(\zeta_s)$}
            else
                storageCostByQuota := storageCostByQuota $\cup$ {$\zeta_s \mapsto (U_{storage} - quota_{min}(\zeta_s)) * unitPrice(\zeta_s)$}
            end
        end
    end
    // Combining storage cost in different tiers to get total.
    storageCost := ${}_{service\_name,\, provider}\mathcal{G}_{sum(storage\_cost)}(storageCostByQuota)$
    $\psi_{storage} := storageCost \bowtie_{provider, location, serviceName} AvgQoS$
    $\psi_{network} := \sigma_{provider \in p \,\wedge\, location \in \ell \,\wedge\, quota_{low} < U_{storage}}(H_{network})$
    // Match appropriate Compute, Storage and Network options.
    $\psi := \psi_{compute} \bowtie_{provider, location} \psi_{storage} \bowtie_{provider, location} \psi_{network}$
    totalCost := empty list
    foreach $\zeta \in \psi$ do
        totalCost $\cup$ {$\zeta \mapsto \sum U_r P_r$}
    end
    ranked := empty list
    foreach $\zeta \in$ totalCost do
        ranked $\cup$ {$\zeta \mapsto \dfrac{\mu W_{upload} + D W_{download}}{\mathcal{L} W_{latency} + \sum U_r P_r}$}
    end
    return sortOnRankDescending(ranked)

Algorithm 1 depicts only one common use case. Other scenarios exist, but they can be solved with a simplified version of Algorithm 1 or with small modifications or additions. We explain these situations in the following paragraphs.

As shown in Algorithm 1, a user can provide the following inputs: ($L$, $M_{min}$, $price_{max}$, $p$, $U$, $W$). $L$ is the set of locations that a user wants to consider; by default we consider all locations. $M_{min}$ is the minimum memory requirement for the VMs; 0 denotes no memory requirement. $price_{max}$ is the maximum budget the user is willing to spend; 0 indicates that the user is only interested in free services, and -1 represents infinity, meaning there is no budget constraint. $p$ is the set of Cloud service providers that a user wants to consider; by default we consider all providers. $U$ represents the estimated usage of all the resources: ($U_{compute}$, $U_{storage}$, $U_{datain}$, $U_{dataout}$). $U_{compute}$ is the number of instances, and $U_{storage}$ is the number of GB of storage that will be used. $U_{dataout}$ is the amount of outward data transfer in GB from the cloud provider to end devices or users. Similarly, $U_{datain}$ represents the amount of inward data transfer. All of these usage estimates are monthly, but other periods such as daily or hourly can be used; as long as all resources are calculated on the same basis, there should be no effect on the final comparison and ordering. $W$ represents a user's preferences; details are explained in Sections III-A2 and III-A3.
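The input conventions above (an empty choice meaning "consider all", 0 meaning no memory requirement, and -1 meaning an unlimited budget) can be sketched as follows. This is a minimal illustration, not our implementation: the class names, field names, the offer dictionary layout, and the use of a non-strict memory comparison are all assumptions made for this example.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Usage:
    """Estimated monthly usage U (field names are hypothetical)."""
    compute: int = 1          # U_compute: number of instances
    storage_gb: float = 0.0   # U_storage
    data_in_gb: float = 0.0   # U_datain
    data_out_gb: float = 0.0  # U_dataout


@dataclass(frozen=True)
class Request:
    """Inputs (L, M_min, price_max, p, U, W) with the defaults described above."""
    locations: Optional[Set[str]] = None   # None: consider all locations
    min_memory_gb: float = 0.0             # 0: no memory requirement
    price_max: float = -1.0                # -1: no budget constraint, 0: free only
    providers: Optional[Set[str]] = None   # None: consider all providers
    usage: Usage = field(default_factory=Usage)
    weights: Dict[str, float] = field(default_factory=dict)


def budget_limit(price_max: float) -> float:
    """Translate the -1 sentinel into an infinite budget."""
    return math.inf if price_max == -1 else price_max


def admits(request: Request, offer: dict, monthly_price: float) -> bool:
    """Static filtering of one offer; assumes memory >= M_min is acceptable."""
    if request.locations is not None and offer["location"] not in request.locations:
        return False
    if request.providers is not None and offer["provider"] not in request.providers:
        return False
    if offer["memory_gb"] < request.min_memory_gb:
        return False
    return monthly_price <= budget_limit(request.price_max)
```

With the default `Request()`, every offer passes the filter regardless of price, whereas `Request(price_max=0)` only admits offers whose monthly price is zero.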

Once the options that satisfy the user requirements have been identified, we calculate the price according to the applicable pricing model. Various pricing models exist, for example free, flat-rate, two-part tariffs (like the AWS reserved instance), block-declining (S3 storage), and bidding (AWS spot instance). Most of them can be incorporated into our model, except the bidding type. One provider often has multiple offers within the same type of service, for example different kinds of instances for the compute service or different storage options. We combine them to get a combinatorial number of choices, do that for all providers, and then calculate the summed cost and rank for each combined option. Not all users need all three types of resources; if they specify 0 for a type of resource, it will not be considered. The network service, however, is always needed.
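As an illustration of block-declining pricing and of combining offers, consider the sketch below. It assumes the usual reading of tiered pricing (each tier bills only the usage that falls inside its quota range), matches offers on provider only, and uses made-up offers and prices; the function names and record layout are hypothetical.

```python
from itertools import product


def tiered_cost(usage_gb, tiers):
    """Block-declining cost. tiers: sorted (quota_min, quota_max, unit_price)."""
    total = 0.0
    for quota_min, quota_max, unit_price in tiers:
        if usage_gb <= quota_min:
            break  # usage never reaches this tier
        billable = min(usage_gb, quota_max) - quota_min
        total += billable * unit_price
    return total


def combined_options(compute, storage, network, usage):
    """Cross the offers of the same provider and sum their monthly cost."""
    options = []
    for c, s, n in product(compute, storage, network):
        if not (c["provider"] == s["provider"] == n["provider"]):
            continue
        cost = (
            usage["compute"] * c["monthly_price"]
            + tiered_cost(usage["storage"], s["tiers"])
            + usage["data_out"] * n["price_per_gb"]
        )
        options.append((c["name"], s["name"], n["name"], cost))
    return options


# Hypothetical offers and prices, for illustration only.
compute = [
    {"provider": "A", "name": "small", "monthly_price": 30.0},
    {"provider": "A", "name": "large", "monthly_price": 90.0},
    {"provider": "B", "name": "std", "monthly_price": 50.0},
]
storage = [
    {"provider": "A", "name": "blob", "tiers": [(0, 1000, 0.10), (1000, 5000, 0.08)]},
    {"provider": "B", "name": "disk", "tiers": [(0, 10000, 0.12)]},
]
network = [
    {"provider": "A", "name": "net-a", "price_per_gb": 0.2},
    {"provider": "B", "name": "net-b", "price_per_gb": 0.1},
]
usage = {"compute": 1, "storage": 1500, "data_out": 50}

options = combined_options(compute, storage, network, usage)
```

A usage of 0 for storage yields a storage cost of 0, so an unused resource type simply drops out of the sum.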

C. Implementation 

Fig. 2 shows the top-level dataflow of the system we implemented. Data is initially collected from web pages by profiler nodes, for which we use the HtmlUnit library [42]. The whole system consists of multiple agents at geographically dispersed locations that collect and process data, as shown in Fig. 3. Every slave node profiles the QoS statistics to various Clouds from its location. Bash scripts export the data from each node. The master node pulls data from its child nodes; access keys are required for this operation. The CSV-formatted data is then imported into the master database, where the appropriate merge operation is performed.

Fig. 4 shows an overview of our system architecture. We use Dropbox in this prototype implementation to demonstrate the feasibility of our approach. As long as data is properly backed up in a separate location, other mechanisms can be used.

Fig. 2. Abstract System Dataflow. This figure is best read together with Fig. 3. We used several (slave) servers to collect data from different locations and then transferred the data to a central server for processing and backup. Data on this server was archived and cleared manually each time we imported the newly collected data into the local offline system for post-processing and cleaning. We only use the (summarised) average QoS data for real-time querying via the API and web GUI, as this allows us to respond faster.

Fig. 3. QoS Monitoring Service Network Topology. We used two Clouds: Nectar Research Cloud and Amazon Web Services. Since Nectar Cloud is free for researchers, we kept its instances running all the time, hence the decision to put the master in Nectar. Because Nectar has a quota limit and Amazon has greater geographical coverage in terms of datacenter locations, we use additional Spot instances from Amazon as slave data crawlers. A QoS Monitoring Node profiles download speed, latency and upload speed at each datacenter in various Clouds from different locations.

The price data is collected from providers' websites. The problem of automatic data collection could be solved if providers released more structured data with sufficient metadata description; we have proposed an ontology in previous work.

[Figure 4: block diagram of the master node. Reasoning module: Usage Estimation; Data aggregation (Price, Location, QoS); Price calculation; Ranking comparison; Requirements matching and Constraint Filtering; Weight computation and AHP. Other scheduled tasks: Get latest currency rate; Other data harvesting (price offers). Storage and backup: MySQL, Dropbox.]

Fig. 4. Master Node System Architecture. In the reasoning module, the main functions and operations are broken down into different blocks. Some other tasks cannot be strictly categorized into the existing modules; those are put into the “Other Tasks” section. The very light grey block contains the evolving part of the system, so it cannot be considered a stable component. While it is possible to back up the whole server, this is not necessary at this stage: the most valuable data is stored in the MySQL database, which can be backed up much more easily and cheaply by creating an “SQL dump”. This dump file is created daily and stored in a Dropbox folder, which is free to use and keeps a history of the files stored in it for 30 days, which is sufficient for our case. The presentation layer (UI and API implementation) and the monitoring module are omitted to keep the diagram simple.

Initially, the QoS data was collected every 2 hours by running the “speedtest” service of CloudHarmony. A single run takes more than an hour to finish, so we were collecting at the finest possible granularity. After analyzing the data, we concluded that such a high frequency is not necessary, as the average QoS from a particular location to a particular data center fluctuates within a reasonable range most of the time. That means the average is fairly stable, and we can use the historical data as a fairly reliable indication. Note that the differences between datacenters and between locations are still large, as expected; see Fig. 5. In the future we may allow a combination of real-time and off-line values to be used if necessary.
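Summarising historical measurements into one average per (client location, data center) pair can be sketched as below. The function name, the tuple layout of a sample, and the numbers are assumptions for illustration; they are not taken from our collected data.

```python
from collections import defaultdict
from statistics import mean


def average_qos(samples):
    """Average download speed per (client_location, datacenter) pair.

    samples: iterable of (client_location, datacenter, download_mbps).
    """
    grouped = defaultdict(list)
    for client_location, datacenter, download_mbps in samples:
        grouped[(client_location, datacenter)].append(download_mbps)
    return {pair: mean(values) for pair, values in grouped.items()}


# Hypothetical measurements.
samples = [
    ("Melbourne", "Sydney", 20.0),
    ("Melbourne", "Sydney", 24.0),
    ("Melbourne", "Tokyo", 3.0),
]
averages = average_qos(samples)
```

Queries can then read the precomputed average instead of scanning every historical sample.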


IV. Experiment 

A. Setup 

We ran our system and the proposed algorithmic technique across a range of hardware systems (see Table X) to understand the implications of hardware resource configuration for the performance of the approach.

To summarize, Environment 1 is the local machine used 
during the development of the program, which is capable of 
running the database and other system modules. 

Environment 2 is a server from the National eResearch Collaboration Tools and Resources (NeCTAR) cloud [43], where our system can be deployed as a service that is easily accessible over the Internet. It is a virtualized environment, so the labeled CPU speed may not accurately reflect the actual allocation. NeCTAR's infrastructure is located at eight different organisations (node sites) around Australia. It operates as one cloud system under the OpenStack framework, which gives it a different UI and API compared to AWS. Being a collaborative research cloud, it is only open to affiliated members (i.e. Australian researchers and students from participating universities). Although access is free, there is a limit of 2 instances per member and a cap on the total resource usage.

TABLE X
EXPERIMENT ENVIRONMENTS

| Environment | Description | Processor Speed | Memory | Processor Name | Role |
|---|---|---|---|---|---|
| 1 | MacBook Air, physical machine | 1.4 GHz | 2 GB | Intel Core 2 Duo | Master |
| 2 | Ubuntu 12.04.3 LTS instance in a virtualized environment | 2.4 GHz (1 vCPU) | 4 GB | AMD Opteron(TM) Processor 6234 | Master/Profiler |
| 3 | Standard Small (m1.small) Linux/UNIX EC2 Spot Instance | 1.79 GHz (1 ECU/vCPU) | 1.7 GB | Intel(R) Xeon(R) CPU E5-2650 | Profiler |
| 4 | Compute Optimized (c3.8xlarge) Linux/UNIX EC2 Spot Instance | 2.8 GHz (32 vCPU, 108 ECU) | 60 GB | Intel(R) Xeon(R) CPU E5-2680 v2 | Performance Testing |

[Figure 5: bar chart of average download speed (Avg Mb/s) for the data centers Sydney; Singapore; Tokyo; California; Oregon, US West; Brazil; Virginia; Ashburn, VA; Dublin, Ireland.]

Fig. 5. Download speed from Amazon data centers to Melbourne

Environment 3 is the spot instance type (from Amazon) that we used to collect QoS statistics from additional locations; to cut down the cost, we kept its usage minimal.

Environment 4 is the compute-optimized spot instance type that we used to test program performance under a powerful CPU, or vertical scalability in short.

B. Network QoS Data 

Fig. 5 shows that a geographically close data center has better network performance (by as much as 25 times), which confirms that location is one of the important criteria to consider during the selection process. Our measurements also indicate that distance is not the only factor that affects network performance. In Fig. 6, data centers are ordered from closest to furthest, left to right; Tokyo and Brazil clearly perform worse than expected. Hence, we see the need for active probing and profiling of network QoS from the user's endpoint connection to the cloud data centers. By doing so we get a clear picture of each data centre's network QoS from the users' devices, which may be deployed across topologically distributed network locations. Note that we have left Sydney out of Fig. 6 on purpose. Fig. 5 shows the exponential increase in speed between Sydney and Melbourne compared to overseas locations, while Fig. 6 shows the linear relationship between download speed and distance among overseas locations. We are aware that, while it is generally true that the geographical distance between any pair of servers (or users) on the Internet affects the round trip time (RTT), the bandwidth between them is not necessarily determined by the distance. Many other aspects can affect the user-end QoS, such as the last-mile home connection technology and local Internet traffic conditions. Our measurements only provide a suggestive basis for further optimisation; a user's actual experience will vary.

Fig. 6. Download speed against distance.

C. Case Study 

1) Input Parameters: Table XI shows the primary configurable parameters of our algorithm. Users' requirements regarding the compulsory parameters usually vary, so we chose a range of values to mimic different selection scenarios. In future work, we may conduct a user survey to understand which factors matter most to different types of users. For example, we could expose all possible constrainable parameters via the API, but this may not be necessary (it would also slow down processing), and it would only overwhelm users who rely on the visual interface. Optional parameters are those that tend to be hard to specify (especially for users with less technical background). The Default Value column shows what we use when a parameter is not specified.

TABLE XI
INPUT PARAMETERS

| Compulsory | Example Value |
|---|---|
| Storage (GB/30 Days) | 20 |
| Outbound Data Transfer (GB/30 Days) | 50 |
| Min RAM (GB) | 4 |

| Optional | Default Value |
|---|---|
| Provider Brand | Consider All |
| Display Currency | AUD |
| Number of Hours to run (per Month) | 720 |
| Number of Instances needed (per Month) | 1 |
| Inbound Data Transfer (GB/30 Days) | 1 |
| Weight of Compute Cost (percentile) | 35% |
| Weight of Storage Cost (percentile) | 25% |
| Weight of Network Cost (percentile) | 35% |
| Weight of Latency (percentile) | 5% |
| Weight of Download Speed (percentile) | 70% |
| Weight of Upload Speed (percentile) | 30% |
| Max RAM (GB) | 100% |


2) Results: Fig. 7 shows the top 5% of the results we get from the inputs in Table XI. The results are in ascending order of the (cost over benefit) ratio, as indicated by the dotted (blue) line, because a lower cost over a higher benefit gives a smaller ratio, which represents a better choice. If we look at the ranking by cost alone, as illustrated by the solid (red) line, the GoGrid offers dominate the Windows Azure offerings. If the results are ordered by ascending price (that is, network QoS constraints are not considered), as shown in Fig. 8, Azure disappears from the top 10% of choices. Similarly, we can see that although the price differences between solutions are small, their overall rankings differ greatly (dotted blue line). For users this means that they can save money by ignoring network QoS, but they should then be ready for degraded network performance. Note that although we tried our best to use real-world data, cloud providers sometimes change their prices as frequently as weekly. In future work we intend to implement a price crawler service that will automatically parse the providers' web pages and update our system's database.
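The difference between ordering by cost alone and ordering by the cost over benefit ratio can be seen in a small sketch. The option names, costs, and benefit scores below are invented for illustration and do not correspond to any provider in Fig. 7 or Fig. 8.

```python
# Hypothetical options: cost in AUD per month, benefit as a normalised QoS score.
options = [
    {"name": "opt1", "cost": 100.0, "benefit": 0.50},
    {"name": "opt2", "cost": 105.0, "benefit": 0.90},
    {"name": "opt3", "cost": 110.0, "benefit": 0.60},
]

# Ordering by cost alone ignores network QoS.
by_cost = [o["name"] for o in sorted(options, key=lambda o: o["cost"])]

# Ordering by cost/benefit: a smaller ratio is a better choice.
by_ratio = [
    o["name"] for o in sorted(options, key=lambda o: o["cost"] / o["benefit"])
]
```

In this example the cheapest option comes last once benefit is taken into account, even though the three prices differ by only 10%.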


3) Performance: The average run time of our current solution is about 11 seconds. With the cache turned on in MySQL, we get up to a 9% improvement on the same query. As the constraints become stricter, the solution space shrinks and the processing time decreases (to as low as 4.97 seconds); see Table XII.


The performance increase observed when we move from Environment 1 to 2 and then to 4 results from an increase in processing power, hence the idea of “scale up”. There is a limit to the amount of processing power one core can have, and our solution is single-threaded at the moment, so there is still room for improvement by utilizing all cores (as available in Environment 4). In the future we will explore the option of configuring MySQL/InnoDB to use multiple threads (the default is 4 and the maximum is 64 since MySQL 5.1.38). Then we will decide whether we need to “scale out”.
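The size of the “scale up” effect can be worked out directly from the average run times reported in Table XII (11.1776 s, 10.476 s and 7.075 s for Environments 1, 2 and 4). The short calculation below is only a worked example; the variable and function names are ours.

```python
# Average run times in seconds, taken from the last row of Table XII.
avg_runtime_s = {1: 11.1776, 2: 10.476, 4: 7.075}


def reduction_percent(before, after):
    """Relative reduction in run time, in percent."""
    return (before - after) / before * 100.0


env1_to_env2 = reduction_percent(avg_runtime_s[1], avg_runtime_s[2])
env1_to_env4 = reduction_percent(avg_runtime_s[1], avg_runtime_s[4])
speedup_env4 = avg_runtime_s[1] / avg_runtime_s[4]
```

This gives a reduction of roughly 6% from Environment 1 to Environment 2 and roughly 37% from Environment 1 to Environment 4, a speedup of about 1.58 times.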


D. Computational Complexity 

We define the upper bound of the computational complexity of our optimization approach as:

$$O\left(|R| \times |C| \times |L| + \ldots\right) \qquad (12)$$

In cost estimation, we have to calculate prices for $|R|$ resources, $|C|$ providers, and $|L|$ geographical locations. If a more complex utilization function is given, the computational complexity may increase.

In our current model, we consider $|v| = 6$; see the weight calculation in Table V. Hence, to determine the weights in the benefit-cost ratio evaluation function, 15 pairwise comparisons have to be made, unless the user chooses to use the default values. In both cases this part of the complexity is a constant, which can be omitted.
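The count of 15 follows from comparing every unordered pair of six criteria, that is 6 × 5 / 2. The sketch below enumerates the pairs; the criterion names are an assumption borrowed from the weight parameters in Table XI and may not match the labels used in Table V.

```python
from itertools import combinations
from math import comb

# Assumed criterion names, taken from the weights listed in Table XI.
criteria = [
    "compute cost",
    "storage cost",
    "network cost",
    "latency",
    "download speed",
    "upload speed",
]

# Every unordered pair of criteria needs one pairwise comparison.
pairs = list(combinations(criteria, 2))
number_of_comparisons = len(pairs)
```

Because the number of criteria is fixed, this term does not grow with the number of resources, providers, or locations.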


V. CONCLUSION AND FUTURE WORK 

The cloud has great potential for a large variety of users with diverse needs, but the selection of the right provider is crucial to this end. Aiming to eliminate potential bottlenecks that limit the ability of general users to take advantage of cloud computing, we present an improved system (which extends our previous work) that allows users to make multi-criteria selections and comparisons of IaaS offers while considering QoS. We hope our research will drive even greater adoption of the cloud and boost the expansion of cloud-hosted applications. Furthermore, the system we propose will also benefit Cloud providers: by providing analyses of the market and demand, it can potentially recommend the price at which providers should offer their services.

In the future, we would like to provide smarter decision support by taking SLAs and legal compliance into consideration. We are also improving the data gathering and updating mechanism. Furthermore, we plan to conduct our experiments on network QoS data collected in real time rather than on archived QoS data (as done in this paper). This will allow us to analyze the performance of the proposed technique under uncertainties such as network congestion and network link failures. There are also other interesting techniques based on ordinal optimization [45], [46] that are worth looking at.

Fig. 7. Results in ascending order by (cost / benefit) ratio



Fig. 8. Results in ascending order by cost

TABLE XII

| Test Number | Storage (GB/30 Days) | Outbound Data Transfer (GB/30 Days) | Min RAM (GB) | Row(s) | Environment 1 (s) | Environment 2 (s) | Environment 4 (s) |
|---|---|---|---|---|---|---|---|
| 1 | 20 | 10 | 0 | 3808 | 12.04 | 11.07 | 10.96 |
| 2 | 40 | 15 | 0 | 3808 | 11.913 | 11.59 | 7.81 |
| 3 | 10 | 2 | 0 | 3808 | 11.169 | 10.76 | 7.05 |
| 4 | 20 | 2 | 0 | 3808 | 11.744 | 11.15 | 7.57 |
| 5 | 200 | 200 | 0 | 3808 | 11.894 | 11.72 | 7.49 |
| 6 | 200 | 200 | 0 | 3808 | 11.912 | 10.85 | 6.76 |
| 7 | 200 | 200 | 16 | 552 | 9.15 | 7.7 | 4.97 |
| 8 | 200 | 200 | 8 | 1524 | 9.644 | 9.69 | 5.53 |
| 9 | 200 | 200 | 4 | 2095 | 10.25 | 8.72 | 5.58 |
| 10 | 20 | 20 | 0 | 3808 | 12.06 | 11.51 | 7.03 |
| Average | | | | | 11.1776 | 10.476 | 7.075 |